Random intersection trees
Abstract
Finding interactions between variables in large and high-dimensional data sets is often a serious computational challenge. Most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for classification problems with binary predictor variables, called Random Intersection Trees. It works by starting with a maximal interaction that includes all variables, and then gradually removing variables if they fail to appear in randomly chosen observations of a class of interest. We show that informative interactions are retained with high probability, and the computational complexity of our procedure is of order p^κ, where p is the number of predictor variables. The exponent κ can reach values as low as 1 for very sparse data; in many more general settings, it will still beat the exponent s obtained when using a brute force search constrained to order s interactions. In addition, by using some new ideas based on min-wise hash schemes, we are able to further reduce the computational cost. Interactions found by our algorithm can be used for predictive modelling in various forms, but they are also often of interest in their own right as useful characterisations of what distinguishes a certain class from others.
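The intersection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `random_intersection_tree` and `prevalence`, the parameters `depth` and `branching`, and the toy data are all assumptions made for illustration. Each observation is represented as the set of indices of its active (value 1) binary variables.

```python
import random
from typing import FrozenSet, List, Optional, Set


def random_intersection_tree(
    class_obs: List[FrozenSet[int]],
    n_vars: int,
    depth: int = 5,
    branching: int = 2,
    rng: Optional[random.Random] = None,
) -> Set[FrozenSet[int]]:
    """Grow one tree of intersections and return its non-empty leaf sets.

    Hypothetical helper: the root holds all variables; every child is its
    parent intersected with a randomly drawn observation of the class of
    interest, so variables missing from that observation are removed.
    """
    rng = rng or random.Random()
    level = [frozenset(range(n_vars))]  # maximal interaction: all variables
    for _ in range(depth):
        next_level = set()
        for node in level:
            for _ in range(branching):
                child = node & rng.choice(class_obs)
                if child:  # drop empty sets, they carry no interaction
                    next_level.add(child)
        level = list(next_level)
    return set(level)


def prevalence(candidate: FrozenSet[int], observations: List[FrozenSet[int]]) -> float:
    """Fraction of observations in which every variable of `candidate` is active."""
    return sum(candidate <= obs for obs in observations) / len(observations)


if __name__ == "__main__":
    # Toy data (assumed): variables 0 and 1 are jointly active in every
    # observation of the class of interest, but never jointly in the other class.
    class_1 = [frozenset({0, 1, 2}), frozenset({0, 1, 3}), frozenset({0, 1, 4})]
    class_0 = [frozenset({0, 2}), frozenset({1, 3}), frozenset({4})]
    for leaf in random_intersection_tree(class_1, n_vars=5, rng=random.Random(0)):
        print(sorted(leaf), prevalence(leaf, class_1), prevalence(leaf, class_0))
```

In this sketch a variable set that is active in every observation of the class of interest survives every intersection, which mirrors the claim that informative interactions are retained. Comparing a surviving set's prevalence in the two classes is one simple way to judge whether it distinguishes them.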
Similar resources
A New Heuristic Algorithm for Drawing Binary Trees within Arbitrary Polygons Based on Center of Gravity
Graphs are widely used in software engineering, networking, and electrical engineering. Graph drawing is, in fact, a geometric representation of information. Among graphs, trees receive particular attention because of their capacity for hierarchical extension as well as their use in processing VLSI circuits. Many algorithms have been proposed for drawing binary trees within polygons. However, these algorithms generate b...
Intersection and mixing times for reversible chains
Suppose X and Y are two independent irreducible Markov chains on n states. We consider the intersection time, which is the first time their trajectories intersect. We show for reversible and lazy chains that the total variation mixing time is always upper bounded by the expected intersection time taken over the worst starting states. For random walks on trees we show the two quantities are equi...
The Upper Critical Dimension of the Abelian Sandpile Model
The existing estimation of the upper critical dimension of the Abelian Sandpile Model is based on a qualitative consideration of avalanches as self-avoiding branching processes. We find an exact representation of an avalanche as a sequence of spanning sub-trees of two-component spanning trees. Using equivalence between chemical paths on the spanning tree and loop-erased random walks, we reduce ...
Random Intersection Trees for finding interactions in large datasets
Finding interactions between variables in large and high-dimensional datasets is often a serious computational challenge. Because of the huge number of possible interactions, most approaches build up interaction sets incrementally, adding variables in a greedy fashion. In order for this to work, higher order interactions must contain informative lower order interactions. Important examples of t...
Pólya Urn Models and Connections to Random Trees: A Review
This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology: • Pólya-Eggenberger’s urn • Bernard Friedman’s urn • Generalized Pólya urns • Extended urn schemes • Invertible urn schemes ...
Branches in random recursive k-ary trees
In this paper, using generalized Pólya urn models, we find the expected value of the size of a branch in recursive $k$-ary trees. We also find the expectation of the number of nodes of a given outdegree in a branch of such trees.
Journal: Journal of Machine Learning Research
Volume: 15
Publication date: 2014